We all know the well-known VLOOKUP() function that helps us combine data from different tables. However, this function has one significant drawback - it cannot combine similar values, that is, if there is an error in the word, then there will be no match.
To be able to combine approximate values, we can create our own function. Let's call it FuzzyLookup().
Let's imagine that we have two lists. Both have approximately the same elements, but they may be written slightly differently. The task is to find for each element in the first list the most similar element from the second list, i.e. implement a search for the nearest maximally similar text.
The big question, in this case, is what to consider the criterion of "similarity". Just the number of matching characters? Is the number of consecutive matches? Should character case or spaces be considered? What to do with different arrangement of words in a phrase? There are many options and there is no single solution - for each situation one or the other will be better than others.
In our case, we implement the simplest option - search by the maximum number of character matches. It's not perfect, but it works pretty well for most situations.
To add function FuzzyLookup , open the menu Tools - Macros - Edit Macros... , select Module1 and copy the following text into the module:
Function FuzzyLOOKUP(LookupValue As String, SrcTable As Variant, Optional SimThreshold As Single) As String ' moonexcel.com.ua Dim Str As String Dim CellArray As Variant Dim StrArray As Variant If IsMissing(SimThreshold) Then SimThreshold = 0 Str = LCase(LookupValue) StrArray = Split(Str) StrExt = UBound(StrArray) For Each Cell In SrcTable CellArray = Split(LCase(Cell)) CellExt = UBound(CellArray) CellRate = 0 ' We check each word in the search phrase For x = 0 To StrExt StrWord = StrArray(x) If Len(StrWord) = 0 Then GoTo continue_x MaxStrWordRate = 0 ' We check each word in the next cell from the original table of values For i = 0 To CellExt CellWord = CellArray(i) If Len(CellWord) = 0 Then GoTo continue_i FindCharNum = OccurrenceNum(StrWord, CellWord) StrWordRate = FindCharNum / Max(Len(StrWord),Len(CellWord)) If StrWordRate > MaxStrWordRate Then MaxStrWordRate = StrWordRate continue_i: Next i CellRate = CellRate + MaxStrWordRate continue_x: Next x ' We keep the best match If CellRate > MaxCellRate Then MaxCellRate = CellRate BestCell = Cell FindCharNum = OccurrenceNum(Str, Cell) SimRate = FindCharNum / Max(Len(Str),Len(Cell)) End If Next Cell IF SimRate >= SimThreshold Then IF SimThreshold = -1 Then ReturnValue = BestCell + " (" + Format(SimRate, "0.00") + ")" ElseIf SimThreshold = -2 Then ReturnValue = Format(SimRate, "0.00") Else ReturnValue = BestCell End If Else ReturnValue = "" End If FuzzyLOOKUP = ReturnValue End Function Function OccurrenceNum(ByVal SourceString As String, ByVal TargetString As String) For i = 1 To Len(SourceString) ' We are looking for the occurrence of each symbol Position = InStr(1, TargetString, Mid(SourceString, i, 1), 1) ' We increase the counter of coincidences If Position > 0 Then Count = Count + 1 ' Remove the found symbol TargetString = Left(TargetString, Position - 1) + Right(TargetString, Len(TargetString) - Position) End If Next i OccurrenceNum = Count End Function Function Max(ByVal value1 As Variant, ByVal value2 As Variant) If value1 > value2 Then Result = value1 Else Result = value2 End If Max = Result End Function
Next, close Macro Editor and return to the worksheet LibreOffice Calc - now you can use our new feature FuzzyLookup() .
You can also use the feature FUZZYLOOKUP() by installing the free extension YouLibreCalc.oxt or its full-featured version YLC_Utilities.oxt .
After that, this function will be available in all files that will be opened in LibreOffice Calc.